Choosing a Codebase for MLLM Post-Training (RL, SFT)

1 Evaluation Criteria

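Code readability and ease of modification

How many bugs there are, and how good the developer support is (in this respect, are frameworks from big companies better?)

Model and algorithm support

Parallelism strategies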

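Basic DDP, DeepSpeed ZeRO, and FSDP; essentially all current frameworks support these.
Tensor Parallel, Pipeline Parallel, Sequence Parallel
- Megatron is the most common implementation.
- DeepSpeed Ulysses also implements sequence parallelism.
- Some frameworks do not rely on either of the two implementations above and implement these strategies themselves.
Sequence packing, to avoid padding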
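To make the Ulysses-style sequence parallelism listed above more concrete, here is a minimal single-process sketch of the layout change it relies on: each rank starts with a slice of the sequence and all attention heads, and an all-to-all exchange leaves each rank with the full sequence but only a slice of the heads, so attention can run locally per head. The function names and the toy sizes are assumptions made for illustration; this simulates the data movement with plain lists and is not DeepSpeed's API.

```python
def seq_to_head_shards(seq_shards, num_heads):
    """Simulate the all-to-all before attention.

    seq_shards[r] is rank r's slice of the sequence; each token is a list
    with one entry per attention head. Returns, per rank, the FULL sequence
    restricted to that rank's slice of heads.
    """
    world = len(seq_shards)
    if num_heads % world != 0:
        raise ValueError("num_heads must be divisible by the number of ranks")
    heads_per_rank = num_heads // world
    head_shards = []
    for rank in range(world):
        lo, hi = rank * heads_per_rank, (rank + 1) * heads_per_rank
        # Gather this rank's head slice from every sequence shard, in order.
        head_shards.append([token[lo:hi] for shard in seq_shards for token in shard])
    return head_shards


def head_to_seq_shards(head_shards):
    """Simulate the reverse all-to-all after attention (equal-size shards)."""
    world = len(head_shards)
    seq_len = len(head_shards[0])
    if seq_len % world != 0:
        raise ValueError("sequence length must be divisible by the number of ranks")
    chunk = seq_len // world
    seq_shards = []
    for rank in range(world):
        shard = []
        for t in range(rank * chunk, (rank + 1) * chunk):
            token = []
            for part in head_shards:
                token.extend(part[t])  # re-join all heads for token t
            shard.append(token)
        seq_shards.append(shard)
    return seq_shards


# Toy example (hypothetical sizes): 8 tokens, 4 heads, 2 ranks.
SEQ_LEN, NUM_HEADS, WORLD = 8, 4, 2
tokens = [[(t, h) for h in range(NUM_HEADS)] for t in range(SEQ_LEN)]
chunk = SEQ_LEN // WORLD
seq_shards = [tokens[r * chunk:(r + 1) * chunk] for r in range(WORLD)]
head_shards = seq_to_head_shards(seq_shards, NUM_HEADS)
restored = head_to_seq_shards(head_shards)
```

In this toy setup each rank ends up holding all 8 tokens but only 2 of the 4 heads, which is why this style of sequence parallelism needs the head count to be divisible by the parallel degree.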

Acceleration techniques

Liger Kernel is a collection of Triton kernels designed specifically for LLM training. According to the project, it can increase multi-GPU training throughput by 20% and reduce memory usage by 60%, which in turn allows about 4x longer context lengths.
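unsloth (Unsloth AI) supports training acceleration and efficient memory management for LoRA and QLoRA, and supports Flash Attention. It does not support full fine-tuning.
- It rewrites the model's computation with OpenAI's Triton, which substantially speeds up training and reduces memory usage during training. Unsloth guarantees that the rewritten computation is equivalent to the original: the implementation uses no approximations, so there is zero loss in training accuracy. Unsloth supports most mainstream GPUs, including V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, and L40.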

TRL - Transformer Reinforcement Learning: supports SFT; a lightweight framework that is essentially a thin wrapper around the Hugging Face Transformers Trainer.

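Compatible with unsloth (Unsloth AI) and Liger Kernel
Distributed training supports only what the Transformers Trainer supports (DDP, DeepSpeed, FSDP); sequence parallel, tensor parallel, and pipeline parallel are not supported, because Transformers itself does not support them (see: Hugging Face Accelerate library, distributed training).
- Could we implement it ourselves with DeepSpeed Ulysses? That should be fairly easy.
Developer support: should be decent, since it is a Hugging Face project.
Supports packing, implemented via Flash Attention.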
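The idea behind padding-free packing can be sketched as follows: several short samples are concatenated into one row, position ids restart at every sample, and cumulative sequence lengths mark the boundaries so that a variable-length attention kernel does not attend across samples. This is a generic illustration under the assumption of a greedy first-fit packer; the function names are hypothetical and TRL's actual implementation may differ.

```python
def pack_sequences(seqs, max_len):
    """Greedy first-fit packing: longest sequences first, each placed into
    the first pack that still has room for it."""
    packs = []
    for seq in sorted(seqs, key=len, reverse=True):
        if len(seq) > max_len:
            raise ValueError("sequence longer than max_len; truncate or split it first")
        for pack in packs:
            if sum(len(s) for s in pack) + len(seq) <= max_len:
                pack.append(seq)
                break
        else:
            packs.append([seq])  # no existing pack had room
    return packs


def flatten_pack(pack):
    """Turn one pack into the flat tensors a varlen attention kernel expects."""
    input_ids, position_ids, cu_seqlens = [], [], [0]
    for seq in pack:
        input_ids.extend(seq)
        position_ids.extend(range(len(seq)))           # positions restart per sample
        cu_seqlens.append(cu_seqlens[-1] + len(seq))   # cumulative boundaries
    return input_ids, position_ids, cu_seqlens


# Toy token ids (made up for illustration).
samples = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
packs = pack_sequences(samples, max_len=5)
input_ids, position_ids, cu_seqlens = flatten_pack(packs[1])
```

With these toy samples, ten tokens fit into two rows of length 5 with no padding, whereas padding every sample to the longest one would take four rows of length 4.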

GitHub - modelscope/ms-swift: built by Alibaba; supports a very large number of models; the developers reply in the WeChat group even on weekends; supports NPU.

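Evaluation uses the framework GitHub - modelscope/evalscope: A streamlined and customizable framework for efficient large model evaluation and performance benchmarking; the backends behind it are OpenCompass (for LLM) and VLMEvalKit (for MLLM).
Parallelism strategies: DDP, device_map, DeepSpeed ZeRO2/ZeRO3, and FSDP.
- The latest version introduces Megatron, but that does not seem to support multimodal models yet? We could ask in the WeChat group when it will be updated.
Developer support: excellent
Compatible with Liger Kernel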
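For reference, the DeepSpeed ZeRO2/ZeRO3 options listed under the parallelism strategies are usually selected through a DeepSpeed JSON config. The sketch below builds such a config as a Python dict; the helper name `make_zero_config` and the specific batch-size and clipping values are assumptions for illustration, and how a given framework passes the file to DeepSpeed is not shown here.

```python
import json


def make_zero_config(stage, micro_batch_size=1, grad_accum_steps=4):
    """Build a minimal DeepSpeed config dict for ZeRO stage 2 or 3.

    The numeric values are placeholders, not tuned recommendations.
    """
    if stage not in (2, 3):
        raise ValueError("this sketch only covers ZeRO stage 2 and stage 3")
    zero = {
        "stage": stage,                # 2: shard optimizer state + gradients; 3: also shard parameters
        "overlap_comm": True,          # overlap communication with the backward pass
        "contiguous_gradients": True,  # reduce memory fragmentation
    }
    if stage == 3:
        # Gather the sharded 16-bit weights into a full model when saving.
        zero["stage3_gather_16bit_weights_on_model_save"] = True
    return {
        "bf16": {"enabled": True},
        "zero_optimization": zero,
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        "gradient_clipping": 1.0,
    }


# Serialize so the result can be written to a .json file.
zero3_json = json.dumps(make_zero_config(3), indent=2)
```

ZeRO3 shards the parameters themselves and therefore fits larger models at the cost of more communication, so ZeRO2 is often tried first when the model already fits.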

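They have a branch working on a Qwen-VL demo? But development seems slow.
Performance and features are very strong!
- Supports Ray
- Supports sequence parallelism (see the Sequence Parallelism page of the OpenRLHF 0.6 documentation), implemented via Ring-Attention
The code probably has a fairly high barrier to entry and is not easy to modify.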
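The core trick behind Ring-Attention-style sequence parallelism can be sketched in a single process: each rank keeps its own query block while the key/value blocks travel around a ring, and partial results are merged with a running (online) softmax so the final output matches ordinary attention. This is a simplified illustration assuming a single head and no causal mask; the function names are hypothetical, and a real implementation overlaps the ring communication with computation on GPUs.

```python
import math


def full_attention(q, k, v):
    """Reference softmax attention over lists of vectors (single head, non-causal)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * vj[d] for wj, vj in zip(w, v)) / z for d in range(len(v[0]))])
    return out


def ring_attention(q_blocks, k_blocks, v_blocks):
    """Simulate ring attention: rank r owns q_blocks[r] and sees one K/V block per step."""
    world = len(q_blocks)
    scale = 1.0 / math.sqrt(len(q_blocks[0][0]))
    v_dim = len(v_blocks[0][0])
    outputs = []
    for rank in range(world):
        q = q_blocks[rank]
        run_max = [-math.inf] * len(q)        # running max of scores per query
        run_sum = [0.0] * len(q)              # running softmax normalizer
        acc = [[0.0] * v_dim for _ in q]      # running weighted sum of values
        for step in range(world):
            src = (rank - step) % world       # K/V block that arrives after `step` hops
            k, v = k_blocks[src], v_blocks[src]
            for i, qi in enumerate(q):
                scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
                new_max = max(run_max[i], max(scores))
                # Rescale what was accumulated under the old max.
                correction = math.exp(run_max[i] - new_max)
                run_sum[i] *= correction
                acc[i] = [a * correction for a in acc[i]]
                for s, vj in zip(scores, v):
                    w = math.exp(s - new_max)
                    run_sum[i] += w
                    for d in range(v_dim):
                        acc[i][d] += w * vj[d]
                run_max[i] = new_max
        outputs.append([[a / z for a in row] for row, z in zip(acc, run_sum)])
    return outputs
```

Because every rank only ever holds one key/value block at a time, activation memory per device scales with the block length rather than the full sequence length.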

Flash attention 2, sequence packing, sequence parallelism support via DeepSpeed Ulysses, LoRA, Liger-kernel.
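Currently supports only text-only LLMs.
The author of [Llama-Factory] built a multimodal version on top of veRL, called GitHub - hiyouga/EasyR1: EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
- However, the author says there are no plans to provide SFT scripts? Even though the SFT functionality is there.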

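Supports packing
Distributed training supports only DDP, DeepSpeed, and FSDP; sequence parallel, tensor parallel, and pipeline parallel are not supported! Same as the official Hugging Face Trainer. Might as well use TRL?
RLHF will not be supported in the future.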

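GitHub - pytorch/torchtune: PyTorch native post-training library: looks quite good, but unfortunately it does not support multimodal models (other than their own llama3v), and the set of supported features does seem rather small.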

GitHub - yujunhuics/Reyes: InternViT + MLP + Qwen2.5; a lightweight framework.